Uniform clip sampling + multi-clip sampling function by themattinthehatt · Pull Request #57 · neuroinformatics-unit/poseinterface

themattinthehatt · 2026-05-29T19:30:22Z

Description

What is this PR

Bug fix
Addition of a new feature
Other

Why is this PR needed?
Current repo only has a single clip selection strategy (manual).

What does this PR do?
Adds a uniform clip sampling strategy option, and updates the CLI to take sampling strategy as input.

References

Builds on #39.

How has this PR been tested?

Added new unit tests, checked all tests are passing locally before making PR.

Is this a breaking change?

Yes; the previous clip extraction entry point was extract-clip; this has been updated to extract-clips to work with multiple clips; in addition, the command line args changed to allow for multiple clip sampling strategies.

Does this PR require an update to the documentation?

Minor update renaming the clip extraction entrypoint from extract-clip to extract-clips.

Checklist:

The code has been tested locally
Tests have been added to cover all new functionality
The documentation has been updated to reflect any changes
The code has been formatted with pre-commit

sfmig

Hi @themattinthehatt, thanks for this!

I made a few suggestions: two of them are two options to try to keep the kwargs within static type checking; the other one is a decoupling suggestion that I think can help simplify tests and has other benefits too.

I think the decoupled approach would be my preferred one in terms of clarity, but it may be a personal preference too. I leave you all the options, let me know what you think. I requested changes in case you want to go for any one of them, I'd be happy to have another look. I haven't checked the tests in detail for the same reason.

sfmig · 2026-06-08T14:44:07Z

+    if args.sampling == "uniform":
+        if args.num_clips is None:
+            raise SystemExit(
+                "error: --num_clips is required when --sampling uniform"
+            )
+        kwargs["num_clips"] = args.num_clips
+    elif args.sampling == "manual":
+        if not args.start_frames:
+            raise SystemExit(
+                "error: --start_frames is required when --sampling manual"
+            )


These checks are enforced twice: once in main and once in extract_clips. It makes sense to have a CLI-friendly error (i.e. no traceback, to report wrong inputs) and an API-friendly one (with traceback) but we can reduce code duplication a bit more with the suggestion below, that "translates" the API error to a CLI one.

(It assumes the first suggestion for the extract_clips signature above was applied)

def main(args: argparse.Namespace) -> None: try: extract_clips( args.video_path, args.duration, args.sampling, num_clips=args.num_clips, start_frames=args.start_frames, ) except ValueError as e: raise SystemExit(f"error: {e}")

Or if we go for the dataclass approach, the purpose of main shifts to instead catch None input cases (rather than simply translate the error):

def main(args): if args.sampling == "uniform": if args.num_clips is None: raise SystemExit("error: --num_clips is required when --sampling uniform") sampling = Uniform(num_clips=args.num_clips) # or for the decoupled approach we can directly have: # extract_clips_uniform(args.video_path, args.duration, args.num_clips) elif args.sampling == "manual": if not args.start_frames: raise SystemExit("error: --start_frames is required when --sampling manual") sampling = Manual(frames=args.start_frames) # or for the decoupled approach we can directly have: # extract_clips(args.video_path, args.duration, args.start_frames) # the following call would not be needed in the decoupled approach extract_clips(args.video_path, args.duration, sampling)

I like the first option better; the dataclass approach feels overly complicated to me (at least at this point)

sfmig · 2026-06-08T15:14:45Z

+def extract_clips(
+    video_path: Path,
+    duration: int,
+    sampling: str,
+    **kwargs: Any,


Ideally, the function signature would be explicit about the different keyword arguments, rather than managing this via the docstring. That way we don't lose the static type checking on those inputs (i.e. right now mypy won't flag num_clips if it is not an int).

One option to do this would be to define all possible inputs and set them to None by default, then branching based on that:

def extract_clips( video_path: Path, duration: int, sampling: Literal["manual", "uniform"], *, num_clips: int | None = None, start_frames: list[int] | None = None, ) -> list[tuple[Path, Path | None]]: if sampling == "manual": if start_frames is None: raise ValueError("sampling='manual' requires start_frames") frames = list(start_frames) elif sampling == "uniform": if num_clips is None: raise ValueError("sampling='uniform' requires num_clips") n_frames = sio.load_video(Path(video_path)).shape[0] frames = _uniform_start_frames(num_clips, duration, n_frames) else: raise ValueError(f"Unknown sampling strategy {sampling!r}") return [extract_clip(video_path, sf, duration) for sf in frames]

Claude also suggested an alternative approach using dataclasses that may be more convenient if we expect the number of sampling strategies/input parameters to grow. Something like the snippet below:

from dataclasses import dataclass @dataclass class Uniform: num_clips: int def start_frames(self, duration: int, n_frames: int) -> list[int]: return _uniform_start_frames(self.num_clips, duration, n_frames) @dataclass class Manual: frames: list[int] def start_frames(self, duration: int, n_frames: int) -> list[int]: return list(self.frames) Sampling = Uniform | Manual def extract_clips( video_path: Path, duration: int, sampling: Sampling ) -> list[tuple[Path, Path | None]]: n_frames = sio.load_video(Path(video_path)).shape[0] frames = sampling.start_frames(duration, n_frames) return [extract_clip(video_path, sf, duration) for sf in frames]

To call it, you would do extract_clips(p, 100, Uniform(num_clips=5)). This one feels a bit more complex but has two nice benefits:

you cannot make a Uniform instance without num_clips, so we can do away with the "API" validation / error handling

if we add more sampling strategies (which I think we will), we don't need to touch extract_clips

I agree about being explicit with keyword args for improved type hints/checking, good idea 👌

sfmig · 2026-06-08T15:46:51Z

+    return [extract_clip(video_path, sf, duration) for sf in start_frames]
+
+
+def _uniform_start_frames(


The current approach in extract_clips bundles together input preparation (i.e. computing start_frames based on the sampling strategy) and the action (i.e. the extraction of the frames). The same applies for the suggestions I gave. This makes a more convenient function call, but the coupling of these two objectives has some downsides.

We can decouple it (initially this is what I had in mind). The extract_clips function can take start_frames as input, and then we have several functions that generate these start frames (e.g., the current _uniform_start_frames). For convenience, we can have wrappers that we can wire to the CLI. These wrappers combine extract_clips with a start-frame-production function:

# "core" function, only extracts def extract_clips( video_path: Path, duration: int, start_frames: list[int], ) -> list[tuple[Path, Path | None]]: # potentially some validation here return [extract_clip(video_path, sf, duration) for sf in start_frames] # wrapper, prepares the input. We link this to the CLI def extract_clips_uniform(video_path, duration, num_clips): n_frames = sio.load_video(video_path).shape[0] frames = _uniform_start_frames(num_clips, duration, n_frames) return extract_clips(video_path, duration, frames)

This decoupled approach has a few benefits:

simpler and smaller unit tests (because each function does one thing)

it scales nicely to other strategies (contributors need to add a "start-frames" function and then write the thin wrapper that collects the call; extract_clips stays the same)

the dispatch that translates a string (uniform) into a function (_uniform_start_frames) now occurs at the CLI main (it is removed entirely from here). We would need to keep the None-guards in main as mentioned above tho.

I think it would also help with the validation of inputs:

Right now, start_frames values aren't validated as a list of positive integers anywhere (each frame is only checked individually inside the loop that calls extract_clip, which is late to catch the error). This is easy to miss with the current approach I think, because validation is a bit spread across different places. The start frame validation would belong nicely within the decoupled extract_clips, and that way we can report all bad frames at once instead of dying on the first.

Claude suggests to use a small validation function, to use in extract_clip and extract_clips. But in the multi-clip approach, we would validate the inputs up-front.

def _validate_clip_request(start_frame: int, duration: int) -> None: if start_frame < 0: raise ValueError(f"start_frame must be non-negative, got {start_frame}") if duration <= 0: raise ValueError(f"duration must be positive, got {duration}")

P.S.: Claude says "argparse can enforce the conditional requirement via subcommands". It would look something like this:

extract-clips uniform --video_path v.mp4 --duration 100 --num_clips 5 extract-clips manual --video_path v.mp4 --duration 100 --start_frames 0 500 1000

I have never done this, and I think I prefer the argparse to have no "important" logic, but just in case you want to look into it.

another great suggestion @sfmig! I also agree that I don't want argparse performing this logic, it's simple in the manual case but we'll likely have more/more complex logic checks for other sampling strategies, so this should be handled in extract_clips_*.

sfmig · 2026-06-08T16:09:45Z

+            "Supported strategies: 'manual', 'uniform'."
+        )
+
+    return [extract_clip(video_path, sf, duration) for sf in start_frames]


Maybe to avoid confusion between both functions:

Suggested change

return [extract_clip(video_path, sf, duration) for sf in start_frames]

return [extract_single_clip(video_path, sf, duration) for sf in start_frames]

themattinthehatt · 2026-06-11T16:48:07Z

@sfmig thanks for the great feedback! I incorporated many of you suggestions. I also imagine at some point we'll build a proper CLI module and move wrapper()/main()/parse_args() out of this module. Is that what you had in mind, or do you want to keep all of this functionality in one place?

I have one test failing: https://github.com/neuroinformatics-unit/poseinterface/actions/runs/27362615831/job/80853717159?pr=57. Do you know why this is the case? I didn't touch this module on my side, is this related to some update in movement?

niksirbi · 2026-06-11T17:51:54Z

I have one test failing: neuroinformatics-unit/poseinterface/actions/runs/27362615831/job/80853717159?pr=57. Do you know why this is the case? I didn't touch this module on my side, is this related to some update in movement?

Ah yeah, that's because of the breaking change of dimension names in movement v0.17.0. To unblock the ongoing PRs, we could temporarily pin movement to v0.16.0 here and then we can follow up with a separate PR bringing the codebase up-to-speed with movement v0.17.0.

niksirbi · 2026-06-11T18:24:01Z

@themattinthehatt, I've quickly fixed the movement issue (it was an easy fix).

If you give #59 a quick approval, we can merge that first and then rebase your 2 PRs to main (that would be the 'cleaner' fix).

themattinthehatt added 5 commits May 29, 2026 14:58

extract_clips and uniform sampling

4131d87

extract tests

6284c23

multi-clip extraction with CLI

43a0039

cli tests

299145a

update docs

a33a6b5

themattinthehatt requested a review from sfmig May 29, 2026 19:30

sfmig requested changes Jun 8, 2026

View reviewed changes

themattinthehatt added 3 commits June 11, 2026 12:20

explicit kwargs in extract_clips

93aa58c

refactor clip extraction logic

d05174b

increase test coverage

f034db7

extract_clip->extract_single_clip

4c34cb8

themattinthehatt mentioned this pull request Jun 11, 2026

Add Lightning Pose conversion example #56

Merged

7 tasks

update api

de93ba2

themattinthehatt added 2 commits June 11, 2026 14:06

fix tests

06a92b9

pin movement for tests

97601bf

niksirbi mentioned this pull request Jun 11, 2026

Pin movement to >0.17.0 and rename dims to singular #59

Merged

7 tasks

Merge branch 'main' into clipsampling

5b91a15

themattinthehatt requested a review from sfmig June 12, 2026 13:17

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uniform clip sampling + multi-clip sampling function#57

Uniform clip sampling + multi-clip sampling function#57
themattinthehatt wants to merge 13 commits into
mainfrom
clipsampling

themattinthehatt commented May 29, 2026

Uh oh!

sfmig left a comment

Uh oh!

Uh oh!

sfmig Jun 8, 2026

Uh oh!

themattinthehatt Jun 11, 2026

Uh oh!

sfmig Jun 8, 2026

Uh oh!

themattinthehatt Jun 11, 2026

Uh oh!

sfmig Jun 8, 2026

Uh oh!

sfmig Jun 8, 2026

Uh oh!

themattinthehatt Jun 11, 2026

Uh oh!

sfmig Jun 8, 2026

Uh oh!

themattinthehatt commented Jun 11, 2026

Uh oh!

niksirbi commented Jun 11, 2026

Uh oh!

niksirbi commented Jun 11, 2026 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

		return [extract_clip(video_path, sf, duration) for sf in start_frames]


		def _uniform_start_frames(

	return [extract_clip(video_path, sf, duration) for sf in start_frames]
	return [extract_single_clip(video_path, sf, duration) for sf in start_frames]

Conversation

themattinthehatt commented May 29, 2026

Description

References

How has this PR been tested?

Is this a breaking change?

Does this PR require an update to the documentation?

Checklist:

Uh oh!

sfmig left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

themattinthehatt commented Jun 11, 2026

Uh oh!

niksirbi commented Jun 11, 2026

Uh oh!

niksirbi commented Jun 11, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

niksirbi commented Jun 11, 2026 •

edited

Loading